Parallel Parsing of Spoken Language
Authors
Randall A. Helzerman, Mary P. Harper, and Carla B. Zoltowski
School of Electrical Engineering, Purdue University, West Lafayette, IN 47907

Abstract
We have extended Maruyama's [4, 2, 3] constraint dependency grammar (CDG) to process a lattice of sentence hypotheses instead of separate text strings. A post-processor to a speech recognizer that produces N-best hypotheses generates the word lattice representation, which is then augmented with the information required for parsing. We first summarize the CDG parsing algorithm and then describe how it is extended to process the lattice on a single-processor machine. Finally, we outline the CRCW P-RAM algorithm for parsing the word lattice, which requires $O(n^4)$ processors to parse in $O(k + n)$ time.

1 Constraint Dependency Grammars
To develop a syntactic analysis for a sentence using CDG, a constraint network (CN) of word nodes is constructed. Associated with each node are its position and a set of roles, which indicate the various functions the word fills in a sentence. Although two roles are required to write a grammar at least as expressive as a context-free grammar [2], the examples here use a single role, governor, which represents the function a word fills in the sentence when it is governed by its head. Each role is initially assigned all role values allowed by the word's lexical category, where a role value consists of a label (the function the word can serve, e.g., SUBJ) and a modifiee (the position of the word it modifies, or nil). Each of the $n$ words in the sentence has $p \cdot q \cdot n = O(n)$ possible role values, where $p$ (the number of roles per word) and $q$ (the number of different labels) are grammatical constants and $n$ is the number of modifiees or words. This gives $O(n^2)$ role values altogether, which require $O(n^2)$ time to generate. Figure 1 shows the initialization of the role values for the sentence "A fish eats."
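The initialization step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy lexicon, the label table (DET, SUBJ, ROOT), and the names `RoleValue` and `initialize_network` are hypothetical. It also assumes that a word never modifies itself, which leaves nil plus the other n - 1 positions, i.e. n possible modifiees per label.

```python
from itertools import product
from typing import NamedTuple, Optional


class RoleValue(NamedTuple):
    label: str                # function the word can serve, e.g. "SUBJ"
    modifiee: Optional[int]   # position of the modified word, or None for nil


# Hypothetical toy grammar for "A fish eats"; not taken from the paper.
LEXICON = {"a": "det", "fish": "noun", "eats": "verb"}
LABELS = {"det": ["DET"], "noun": ["SUBJ"], "verb": ["ROOT"]}
ROLES = ["governor"]  # the single role used in the paper's examples


def initialize_network(words):
    """Assign each role of each word every role value its category allows."""
    n = len(words)
    network = {}
    for pos, word in enumerate(words, start=1):
        labels = LABELS[LEXICON[word]]
        # Assumption: a word cannot modify itself, so the candidate modifiees
        # are nil plus the other n - 1 positions (n choices in total).
        modifiees = [None] + [m for m in range(1, n + 1) if m != pos]
        for role in ROLES:
            network[(pos, role)] = [
                RoleValue(label, mod) for label, mod in product(labels, modifiees)
            ]
    return network


if __name__ == "__main__":
    network = initialize_network(["a", "fish", "eats"])
    for (pos, role), values in network.items():
        shown = ", ".join(
            f"{v.label}-{'nil' if v.modifiee is None else v.modifiee}" for v in values
        )
        print(pos, role, shown)
    # 1 governor DET-nil, DET-2, DET-3
    # 2 governor SUBJ-nil, SUBJ-1, SUBJ-3
    # 3 governor ROOT-nil, ROOT-1, ROOT-2
```

With one role and one label per category, each of the three words receives three role values, nine in total. For a sentence of n words with p roles and q labels, the same loops produce at most p·q·n values per word and p·q·n² overall, in line with the O(n²) count given in the text.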
Similar resources
Constraint satisfaction for robust parsing of spoken language
The eliminative nature of Constraint Satisfaction over finite domains offers an interesting potential for robustness in the parsing of spoken language. An approach is presented, which puts unusually ambitious demands on the design of the Constraint Satisfaction procedure by trying to combine preferential reasoning, dynamic scheduling, parallel processing and incremental constraint solving within a...
Parsing Arabic Dialects
The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem ...
Cross-Domain and Cross-Language Porting of Shallow Parsing
English was the main focus of attention of the Natural Language Processing (NLP) community for years. As a result, there are significantly more annotated linguistic resources in English than in any other language. Consequently, data-driven tools for automatic text or speech processing are developed mainly for English. Developing similar corpora and tools for other languages is an important issu...
Parsing and Subcategorization Data
In this paper, we compare the performance of a state-of-the-art statistical parser (Bikel, 2004) in parsing written and spoken language and in generating subcategorization cues from written and spoken language. Although Bikel’s parser achieves a higher accuracy for parsing written language, it achieves a higher accuracy when extracting subcategorization cues from spoken language. Our experiment...
PARSEC: A Constraint-Based Parser for Spoken Language Processing
PARSEC, a text-based and spoken language processing framework based on the Constraint Dependency Grammar (CDG) developed by Maruyama [26,27], is discussed. The scope of CDG is expanded to allow for the analysis of sentences containing lexically ambiguous words, to allow feature analysis in constraints, and to efficiently process multiple sentence candidates that are likely to arise in spoken l...
A Parallel Parser for Spoken Natural Language
This paper describes SYNAPSIS, a parser for performing real-time understanding of spoken utterances in a parallel computational environment. Understanding continuous speech allowing reasonably free syntax poses two main problems, namely the risk of erroneous interpretations and the largeness of the search space owing to the high uncertainty of the input. The parser is characterized by an approa...